A Conceptual Framework for System Fault Tolerance
Abstract
A major problem in transitioning fault tolerance practices to the practitioner community is the lack of a common view of what fault tolerance is and how it can help in the design of reliable computer systems. This document takes a step towards making fault tolerance more understandable by proposing a conceptual framework. The framework provides a consistent vocabulary for fault tolerance concepts, discusses how systems fail, describes commonly used mechanisms for making systems fault tolerant, and provides some rules for developing fault-tolerant systems.